Anomaly Detection on D-root (Abstract)
Abstract
DNS root name servers play a crucial role in the operation of the Internet, and detecting and identifying anomalous activity around them is a critical task for network operators. It is not hard to “detect” huge attacks [1], but how do we detect more than just the strongest, most extreme signals? How can we extract, study and understand the smaller (but still nontrivial) anomalous events? These events might stem from leaked botnet traffic, throw-away traffic from misconfigured resolvers, or traffic load changes caused by routing issues. Detecting all of them requires effectively extracting anomalous patterns from massive, multidimensional measurements. We present initial work towards detecting and identifying anomalous activity on DNS root servers. Our method uses Principal Component Analysis (PCA) to separate DNS traffic measurements into disjoint subspaces corresponding to normal and anomalous network behavior. We performed a preliminary analysis using data from the D-root servers operated by UMD, and identified the detected anomalies by manual inspection.
Many techniques have been proposed to detect anomalies in network traffic, but they focus mainly on volume anomalies [2–4]. DNS traffic, however, offers additional features, such as query names, query types and DNSSEC queries. Extensive prior work on DNS traffic analysis focuses on specific anomalous activities, such as botnet traffic [5–7] (domain-flux or fast-flux traffic) or DoS traffic [8, 9]. We instead aim to detect general anomalous traffic on root servers, which may be related to attacks, misconfigured resolvers, routing issues, and so on.
The dataset used in our study consists of sampled DNS queries and responses collected from the 98 anycast sites of D-root. For each site, we aggregate both queries and responses over each one-hour period and compute the desired measurements.
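A minimal sketch of this per-site, per-hour aggregation, computing the four measurements defined in the next paragraph from sampled query records. The record type `SampledQuery` and the helpers `bin_by_hour` and `hourly_measurements` are hypothetical, not the authors' code, and several interpretive choices are assumptions: "source addresses per second" is taken as the number of distinct sources seen in the hour divided by 3600, entropy is measured in bits, query names are lower-cased before counting, and counts are not rescaled for the (unstated) sampling rate.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass

HOUR = 3600  # seconds per aggregation bin


@dataclass
class SampledQuery:
    # Hypothetical record for one sampled query seen at one anycast site
    timestamp: float  # Unix time in seconds
    src: str          # source IP address
    qname: str        # query name


def bin_by_hour(queries):
    """Group one site's sampled queries into hour-long bins keyed by hour index."""
    bins = defaultdict(list)
    for q in queries:
        bins[int(q.timestamp // HOUR)].append(q)
    return bins


def hourly_measurements(queries):
    """Compute the four per-hour measurements for one site's bin (sampled counts)."""
    n = len(queries)
    if n == 0:
        return {"qps": 0.0, "sources_per_sec": 0.0,
                "query_diversity": 0.0, "source_entropy": 0.0}
    per_source = Counter(q.src for q in queries)
    # Assumption: lower-case names so case-randomized queries count as one name
    unique_names = len({q.qname.lower() for q in queries})
    # Shannon entropy (bits) of the query-count distribution across sources
    entropy = -sum((c / n) * math.log2(c / n) for c in per_source.values())
    return {
        "qps": n / HOUR,
        # Assumption: distinct sources in the hour, averaged per second
        "sources_per_sec": len(per_source) / HOUR,
        "query_diversity": unique_names / n,
        "source_entropy": entropy,
    }
```

Running `hourly_measurements` on every bin of every site yields one value per (hour, site) pair for each measurement, which is the raw material for the per-measurement matrix described next.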
The one-hour period is a tradeoff between the amount of data to be processed in each period and the granularity of the anomalies to be detected. We focus on the following measurements: (1) queries per second (QPS); (2) source addresses per second; (3) query diversity: the number of unique query names divided by the number of queries; (4) source address entropy: the entropy of the distribution of query counts across all source addresses, which describes how concentrated the queries are. Given one of these measurements and a time interval spanning $T$ one-hour periods, we construct a $T \times p$ measurement matrix $X$, where $p$ is the number of anycast sites, and then apply our detection method to this matrix.
The idea of PCA-based anomaly detection is to identify the typical variations among the measurements and to detect anomalous deviations from them [2]. Given the measurement matrix $X$, we apply PCA to the covariance matrix of $X$ to compute a set of principal components $\{v_i\}_{i=1}^{p}$ that capture the variance in $X$. We then select the first $m < p$ principal components $\{v_i\}_{i=1}^{m}$ to construct the normal subspace, which captures the majority of the variation; the remaining components span the residual subspace. When a new observation $y$ arrives, it is decomposed into its projections onto the normal ($\hat{y}$) and residual ($\tilde{y}$) subspaces, i.e., $y = \hat{y} + \tilde{y}$. The energy of $\tilde{y}$ (i.e., $\|\tilde{y}\|$) describes the degree of deviation from normal variation, so statistical tests [10] can be applied to it to decide whether the observation is anomalous.
To handle time-dependent (diurnal and weekly) patterns in DNS traffic, we construct $X$ from the measurements in the one-week window preceding the new observation. We update $X$ only when the new observation is not anomalous, and rerun PCA accordingly. We applied our method to all of the measurements above, but here we show initial results for anomalies detected in QPS.
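The subspace method and the sliding one-week window can be sketched with NumPy as follows. This is an illustrative reconstruction, not the authors' implementation. Assumptions: the statistical test is taken to be the Q-statistic of Jackson and Mudholkar, which is commonly paired with the PCA subspace method and operates on the squared residual norm $\|\tilde{y}\|^2$ (whether this is the test cited as [10] is not confirmed here); the names `SubspaceDetector`, `q_threshold`, `WEEK_HOURS`, and the parameters `m` and `alpha` are hypothetical, since the abstract does not state how $m$ or the significance level were chosen.

```python
import numpy as np
from statistics import NormalDist

WEEK_HOURS = 168  # one-week sliding window of hourly observations


def q_threshold(res_eigvals, alpha):
    """Q-statistic threshold (Jackson & Mudholkar) on the squared residual norm.

    res_eigvals: covariance eigenvalues of the residual components.
    """
    phi1, phi2, phi3 = (float(np.sum(res_eigvals ** k)) for k in (1, 2, 3))
    if phi1 <= 0.0 or phi2 <= 0.0:
        raise ValueError("residual subspace has no variance")
    h0 = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2 ** 2)
    if h0 <= 0.0:
        raise ValueError("Q-statistic approximation requires h0 > 0")
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)  # (1 - alpha) standard-normal quantile
    term = (c_alpha * np.sqrt(2.0 * phi2 * h0 ** 2) / phi1
            + 1.0 + phi2 * h0 * (h0 - 1.0) / phi1 ** 2)
    return phi1 * term ** (1.0 / h0)


class SubspaceDetector:
    """PCA subspace detector over a T x p matrix (rows: hours, columns: sites)."""

    def __init__(self, history, m, alpha=0.001, window=WEEK_HOURS):
        self.m, self.alpha, self.window = m, alpha, window
        self.X = np.asarray(history, dtype=float)[-window:]
        self._fit()

    def _fit(self):
        self.mean = self.X.mean(axis=0)
        cov = np.cov(self.X, rowvar=False)            # p x p covariance (np.cov centers)
        eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1]             # re-sort descending
        eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
        P = eigvecs[:, order[:self.m]]                # basis of the normal subspace
        self.C_res = np.eye(P.shape[0]) - P @ P.T     # projector onto residual subspace
        self.threshold = q_threshold(eigvals[self.m:], self.alpha)

    def observe(self, y):
        """Return (is_anomalous, squared residual norm) for a new hourly row y."""
        raw = np.asarray(y, dtype=float)
        y_res = self.C_res @ (raw - self.mean)        # residual part, y~
        spe = float(y_res @ y_res)                    # ||y~||^2
        anomalous = bool(spe > self.threshold)
        if not anomalous:
            # Slide the window forward only with normal observations, then refit PCA
            self.X = np.vstack([self.X, raw])[-self.window:]
            self._fit()
        return anomalous, spe
```

For D-root, `history` would be one week of hourly QPS values for the 98 sites (a 168 × 98 matrix). Thresholding $\|\tilde{y}\|^2$ rather than $\|\tilde{y}\|$ flags exactly the same observations, because squaring is monotone on non-negative values.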
Across the QPS measurements from all of D-root’s anycast sites throughout 2015, we detected 136 hour-long periods containing anomalies. To verify and identify these anomalies, we inspected them manually, focusing on significant patterns in QPS, query diversity and source address entropy during each anomaly. Based on these patterns, we classified the anomalies into four types. Of these, 62 anomalies were classified as “botnet activities”: they showed a large increase in traffic volume together with clearly malicious query-name patterns related to DoS attacks or algorithmically generated domains, such as [nonce] + “.ts8899.net〈20〉”. Another 18 anomalies had high-volume traffic with query names potentially related to bugs or faults in resolvers, such as “www.”, “http.” and “.”. There were 22 “traffic switch” anomalies, in which traffic volume decreased at several replicas but increased at others, and 21 “traffic drop” anomalies, which showed a significant volume decrease at some replicas with no corresponding increase at others. The remaining 13 anomalies could not be classified into any of these types. Over the 62 periods with “botnet activities” detected in our dataset, D-root observed about 7 billion suspicious queries in total; by comparison, approximately 18 billion attack queries were observed during the widely publicized DDoS attack on A-root on Dec. 1, 2015 [11]. With this series of botnet activities, we can profile botnet behavior and track its evolution. Anomalies with buggy queries
Publication date: 2016